1 Pasilla vignette: 20170820

2 Example hpgltool usage with a real data set (pasilla)

In this document, I am hoping to mostly copy/paste material from the tests/ tree and explain the various functionalities therein. It is my hope therefore to step from data loading all the way through ontology searching with appropriate visualizations at each stage.

3 Load Data

In test_01load_data.R I perform load some data into an expressionset and get ready to play with it.

## I use sm to keep functions from printing too much (well, anything really)
tt <- sm(library(hpgltools))
tt <- sm(library(pasilla))
tt <- sm(data(pasillaGenes))

3.1 Gather annotation data

biomart is an excellent resource for annotation data, but it is entirely too complex. The following function ‘get_biomart_annotations()’ attempts to make that relatively simple.

## Try loading some annotation information for this species.
gene_info_lst <- sm(load_biomart_annotations(species="dmelanogaster",
                                             host="useast.ensembl.org"))
gene_info <- gene_info_lst[["annotation"]]
info_idx <- gene_info[["gene_biotype"]] == "protein_coding"
gene_info <- gene_info[info_idx, ]
rownames(gene_info) <- make.names(gene_info[["ensembl_gene_id"]], unique=TRUE)
head(gene_info)
##             ensembl_transcript_id ensembl_gene_id description   gene_biotype
## FBgn0260439           FBtr0005088     FBgn0260439          NA protein_coding
## FBgn0000056           FBtr0006151     FBgn0000056          NA protein_coding
## FBgn0031085           FBtr0070002     FBgn0031085          NA protein_coding
## FBgn0062565           FBtr0070003     FBgn0062565          NA protein_coding
## FBgn0031089           FBtr0070006     FBgn0031089          NA protein_coding
## FBgn0031094           FBtr0070008     FBgn0031094          NA protein_coding
##             cds_length chromosome_name strand start_position end_position
## FBgn0260439       1776              2L      +        8366038      8370090
## FBgn0000056        819              2L      +       14615552     14618902
## FBgn0031085        633               X      +       20051294     20052519
## FBgn0062565       1164               X      +       20094398     20095767
## FBgn0031089       1326               X      +       20148124     20155514
## FBgn0031094        819               X      +       20170222     20171526

3.2 Load count tables

The pasilla data set provides count tables in a tab separated file, let us read them into an expressionset in the following block along with creating an experimental design. create_expt() will then merge the annotations, experimental design, and count tables into an expressionset.

## This section is copy/pasted to all of these tests, that is dumb.
datafile <- system.file("extdata/pasilla_gene_counts.tsv", package="pasilla")
## Load the counts and drop super-low counts genes
counts <- read.table(datafile, header=TRUE, row.names=1)
counts <- counts[rowSums(counts) > ncol(counts),]
## Set up a quick design to be used by cbcbSEQ and hpgltools
design <- data.frame(row.names=colnames(counts),
    condition=c("untreated","untreated","untreated",
        "untreated","treated","treated","treated"),
    libType=c("single_end","single_end","paired_end",
        "paired_end","single_end","paired_end","paired_end"))
metadata <- design
colnames(metadata) <- c("condition", "batch")
metadata[["sampleid"]] <- rownames(metadata)

## Make sure it is still possible to create an expt
pasilla_expt <- sm(create_expt(count_dataframe=counts, metadata=metadata,
                               savefile="pasilla", gene_info=gene_info))

The rest of test_01load_data.R checks the various slots of the resulting expt to ensure that important stuff for future analyses are available, primarily: condition/batch, library sizes, annotations, counts.

4 Graph metrics

The next set of tests seek to ensure that the various plots used to visualize and understand trends in the data are maintained over time.

In this first block I will use a single function graph_metrics() to plot them all. And then follow up with the one at a time. Many functions in hpgltools are quite chatty with liberal usage of message(), as a result I will sm() this call to shut it up.

pasilla_metrics <- sm(graph_metrics(pasilla_expt, ma=TRUE, qq=TRUE))
summary(pasilla_metrics)
##                 Length Class        Mode   
## boxplot          9     gg           list   
## corheat          3     recordedplot list   
## cvplot           9     gg           list   
## density         10     gg           list   
## density_table    5     data.table   list   
## disheat          3     recordedplot list   
## gene_heatmap     0     -none-       NULL   
## legend           3     recordedplot list   
## legend_colors    3     data.frame   list   
## libsize          9     gg           list   
## libsizes         4     data.table   list   
## libsize_summary  7     data.table   list   
## ma              21     -none-       list   
## nonzero          9     gg           list   
## nonzero_table    7     data.frame   list   
## pc_loadplot      3     recordedplot list   
## pc_summary       4     data.frame   list   
## pc_propvar       6     -none-       numeric
## pc_plot          9     gg           list   
## pc_table        14     data.frame   list   
## qqlog            3     recordedplot list   
## qqrat            3     recordedplot list   
## smc              9     gg           list   
## smd              9     gg           list   
## topnplot         9     gg           list   
## tsne_summary     4     data.frame   list   
## tsne_propvar    20     -none-       numeric
## tsne_plot        9     gg           list   
## tsne_table      10     data.frame   list

Now let us print the graphs

pasilla_metrics$libsize

## The library sizes range from 8-21 million reads, this might be a problem for
## some analyses.
pasilla_metrics$nonzero

## Ergo, the lower abundance libraries have more genes of counts == 0 (bottom
## left).
pasilla_metrics$boxplot

## And a boxplot downshifts them (but not that much because it decided to put
## the data on the log scale).
pasilla_metrics$density

## Similarly, one can see those samples are a bit lower with respect to density

## Unless the data is very well behaved, the rest of the plots are not likely to
## look good until the data is normalized, nonetheless, lets see
pasilla_metrics$corheat

pasilla_metrics$disheat

pasilla_metrics$pc_plot

## So the above 3 plots are pretty much the worst case scenario for this data.

5 Normalize and replot

The most common normalization suggested by Najib is a cpm(quantile(filter(data))). On top of that we often do log2() and/or a batch adjustment. default_norm() quietly does the first and may be supplemented with other arguments.

norm <- default_norm(pasilla_expt, transform="log2")
## This function will replace the expt$expressionset slot with:
## log2(cpm(quant(cbcb(data))))
## It will save copies of each step along the way
##  in expt$normalized with the corresponding libsizes. Keep libsizes in mind
##  when invoking limma.  The appropriate libsize is non-log(cpm(normalized)).
##  This is most likely kept at:
##  'new_expt$normalized$intermediate_counts$normalization$libsizes'
##  A copy of this may also be found at:
##  new_expt$best_libsize
## Not correcting the count-data for batch effects.  If batch is
##  included in EdgerR/limma's model, then this is probably wise; but in extreme
##  batch effects this is a good parameter to play with.
## Step 1: performing count filter with option: cbcb
## Removing 2622 low-count genes (7531 remaining).
## Step 2: normalizing the data with quant.
## Step 3: converting the data with cpm.
## Step 4: transforming the data with log2.
## Step 5: not doing batch correction.
norm_metrics <- graph_metrics(norm)
## Graphing number of non-zero genes with respect to CPM by library.
## Graphing library sizes.
## Graphing a boxplot.
## Graphing a correlation heatmap.
## Graphing a standard median correlation.
## Performing correlation.
## Graphing a distance heatmap.
## Graphing a standard median distance.
## Performing distance.
## Graphing a PCA plot.
## Graphing a T-SNE plot.
## Plotting a density plot.
## Plotting a CV plot.
## Naively calculating coefficient of variation/dispersion with respect to condition.
## Finished calculating dispersion estimates.
## Plotting the representation of the top-n genes.
## Plotting the expression of the top-n PC loaded genes.
## Printing a color to condition legend.
norm_metrics$corheat

norm_metrics$smc

norm_metrics$disheat

norm_metrics$smd

## some samples look a little troublesome here.
norm_metrics$pc_plot

6 Try a pairwise comparison

pasilla_pairwise <- sm(all_pairwise(pasilla_expt))
pasilla_tables <- sm(combine_de_tables(
  pasilla_pairwise,
  excel="pasilla_tables.xlsx"))
pasilla_sig <- sm(extract_significant_genes(
  pasilla_tables,
  excel="pasilla_sig.xlsx"))
pasilla_ab <- sm(extract_abundant_genes(
  pasilla_pairwise,
  excel="pasilla_abundant.xlsx"))
pasilla_tables$plots[[1]][["deseq_ma_plots"]]$plot

pasilla_tables$plots[[1]][["edger_ma_plots"]]$plot

pasilla_tables$plots[[1]][["limma_ma_plots"]]$plot

up_genes <- pasilla_sig[["deseq"]][["ups"]][[1]]
down_genes <- pasilla_sig[["deseq"]][["downs"]][[1]]
pasilla_go <- load_biomart_go(species="dmelanogaster")$go
## Finished downloading ensembl go annotations, saving to dmelanogaster_go_annotations.rda.
## Saving ontologies to dmelanogaster_go_annotations.rda.
## Finished save().
pasilla_length <- fData(pasilla_expt)[, c("ensembl_gene_id", "cds_length")]
colnames(pasilla_length) <- c("ID", "length")
pasilla_goseq <- simple_goseq(sig_genes=up_genes, go_db=pasilla_go, length_db=pasilla_length)
## Using the row names of your table.
## Found 104 genes out of 113 from the sig_genes in the go_db.
## Found 79 genes out of 113 from the sig_genes in the length_db.
## Using manually entered categories.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
## simple_goseq(): Calculating q-values
## simple_goseq(): Filling godata with terms, this is slow.
## Testing that go categories are defined.
## Removing undefined categories.
## Gathering synonyms.
## Gathering category definitions.
## simple_goseq(): Making pvalue plots for the ontologies.

pasilla_goseq$pvalue_plots$bpp_plot_over

pasilla_goseq <- simple_goseq(sig_genes=down_genes, go_db=pasilla_go, length_db=pasilla_length)
## Using the row names of your table.
## Found 99 genes out of 109 from the sig_genes in the go_db.
## Found 76 genes out of 109 from the sig_genes in the length_db.
## Using manually entered categories.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
## simple_goseq(): Calculating q-values
## simple_goseq(): Filling godata with terms, this is slow.
## Testing that go categories are defined.
## Removing undefined categories.
## Gathering synonyms.
## Gathering category definitions.
## simple_goseq(): Making pvalue plots for the ontologies.

pasilla_goseq$pvalue_plots$bpp_plot_over

test <- simple_goseq(sig_genes=names(pasilla_ab$abundances$deseq$treated), go_db=pasilla_go, length_db=pasilla_length)
## Found 188 genes out of 200 from the sig_genes in the go_db.
## Found 181 genes out of 200 from the sig_genes in the length_db.
## Using manually entered categories.
## Calculating the p-values...
## 'select()' returned 1:1 mapping between keys and columns
## simple_goseq(): Calculating q-values
## simple_goseq(): Filling godata with terms, this is slow.
## Testing that go categories are defined.
## Removing undefined categories.
## Gathering synonyms.
## Gathering category definitions.
## simple_goseq(): Making pvalue plots for the ontologies.

test$pvalue_plots$bpp_plot_over

pander::pander(sessionInfo())

R version 4.0.3 (2020-10-10)

Platform: x86_64-pc-linux-gnu (64-bit)

locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C

attached base packages: splines, stats4, parallel, stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pasilla(v.1.18.0), GO.db(v.3.12.1), AnnotationDbi(v.1.52.0), GOstats(v.2.56.0), edgeR(v.3.32.0), lme4(v.1.1-25), Matrix(v.1.2-18), BiocParallel(v.1.24.1), variancePartition(v.1.20.0), fission(v.1.10.0), ruv(v.0.9.7.1), SummarizedExperiment(v.1.20.0), GenomicRanges(v.1.42.0), GenomeInfoDb(v.1.26.1), IRanges(v.2.24.0), S4Vectors(v.0.28.0), MatrixGenerics(v.1.2.0), matrixStats(v.0.57.0), hpgltools(v.1.0), R6(v.2.5.0), Biobase(v.2.50.0) and BiocGenerics(v.0.36.0)

loaded via a namespace (and not attached): R.utils(v.2.10.1), tidyselect(v.1.1.0), RSQLite(v.2.2.1), htmlwidgets(v.1.5.2), grid(v.4.0.3), Rtsne(v.0.15), devtools(v.2.3.2), IHW(v.1.16.0), DESeq(v.1.39.0), munsell(v.0.5.0), codetools(v.0.2-18), preprocessCore(v.1.52.0), statmod(v.1.4.35), withr(v.2.3.0), colorspace(v.2.0-0), Category(v.2.56.0), highr(v.0.8), knitr(v.1.30), rstudioapi(v.0.13), Vennerable(v.3.1.0.9000), robustbase(v.0.93-6), genoPlotR(v.0.8.9), labeling(v.0.4.2), slam(v.0.1-47), GenomeInfoDbData(v.1.2.4), lpsymphony(v.1.18.0), topGO(v.2.42.0), bit64(v.4.0.5), farver(v.2.0.3), rprojroot(v.2.0.2), vctrs(v.0.3.5), generics(v.0.1.0), xfun(v.0.19), BiocFileCache(v.1.14.0), fastcluster(v.1.1.25), doParallel(v.1.0.16), locfit(v.1.5-9.4), bitops(v.1.0-6), DelayedArray(v.0.16.0), assertthat(v.0.2.1), scales(v.1.1.1), gtable(v.0.3.0), affy(v.1.68.0), sva(v.3.38.0), processx(v.3.4.4), rlang(v.0.4.8), genefilter(v.1.72.0), rtracklayer(v.1.50.0), lazyeval(v.0.2.2), selectr(v.0.4-2), BiocManager(v.1.30.10), yaml(v.2.2.1), reshape2(v.1.4.4), GenomicFeatures(v.1.42.1), crosstalk(v.1.1.0.1), qvalue(v.2.22.0), RBGL(v.1.66.0), tools(v.4.0.3), usethis(v.1.6.3), ggplot2(v.3.3.2), affyio(v.1.60.0), ellipsis(v.0.3.1), gplots(v.3.1.0), RColorBrewer(v.1.1-2), blockmodeling(v.1.0.0), sessioninfo(v.1.1.1), Rcpp(v.1.0.5), plyr(v.1.8.6), progress(v.1.2.2), zlibbioc(v.1.36.0), purrr(v.0.3.4), RCurl(v.1.98-1.2), BiasedUrn(v.1.07), ps(v.1.4.0), prettyunits(v.1.1.1), openssl(v.1.4.3), ggrepel(v.0.8.2), colorRamps(v.2.3), fs(v.1.5.0), magrittr(v.2.0.1), data.table(v.1.13.2), openxlsx(v.4.2.3), SparseM(v.1.78), goseq(v.1.42.0), pkgload(v.1.1.0), hms(v.0.5.3), evaluate(v.0.14), xtable(v.1.8-4), pbkrtest(v.0.4-8.6), XML(v.3.99-0.5), gridExtra(v.2.3), testthat(v.3.0.0), compiler(v.4.0.3), biomaRt(v.2.46.0), tibble(v.3.0.4), KernSmooth(v.2.23-18), crayon(v.1.3.4), minqa(v.1.2.4), R.oo(v.1.24.0), htmltools(v.0.5.0), mgcv(v.1.8-33), corpcor(v.1.6.9), tidyr(v.1.1.2), geneplotter(v.1.68.0), DBI(v.1.1.0), geneLenDataBase(v.1.26.0), dbplyr(v.2.0.0), MASS(v.7.3-53), rappdirs(v.0.3.1), boot(v.1.3-25), ade4(v.1.7-16), readr(v.1.4.0), cli(v.2.2.0), quadprog(v.1.5-8), R.methodsS3(v.1.8.1), pkgconfig(v.2.0.3), GenomicAlignments(v.1.26.0), plotly(v.4.9.2.1), xml2(v.1.3.2), foreach(v.1.5.1), annotate(v.1.68.0), XVector(v.0.30.0), AnnotationForge(v.1.32.0), rvest(v.0.3.6), EBSeq(v.1.30.0), stringr(v.1.4.0), callr(v.3.5.1), digest(v.0.6.27), graph(v.1.68.0), Biostrings(v.2.58.0), rmarkdown(v.2.5), GSEABase(v.1.52.0), directlabels(v.2020.6.17), curl(v.4.3), Rsamtools(v.2.6.0), gtools(v.3.8.2), nloptr(v.1.2.2.2), lifecycle(v.0.2.0), nlme(v.3.1-150), jsonlite(v.1.7.1), desc(v.1.2.0), viridisLite(v.0.3.0), askpass(v.1.1), limma(v.3.46.0), fansi(v.0.4.1), pillar(v.1.4.7), lattice(v.0.20-41), httr(v.1.4.2), DEoptimR(v.1.0-8), pkgbuild(v.1.1.0), survival(v.3.2-7), glue(v.1.4.2), remotes(v.2.2.0), zip(v.2.1.1), fdrtool(v.1.2.15), iterators(v.1.0.13), Rgraphviz(v.2.34.0), pander(v.0.6.3), bit(v.4.0.4), stringi(v.1.5.3), blob(v.1.2.1), DESeq2(v.1.30.0), caTools(v.1.18.0), memoise(v.1.1.0) and dplyr(v.1.0.2)

message(paste0("This is hpgltools commit: ", get_git_commit()))
## If you wish to reproduce this exact build of hpgltools, invoke the following:
## > git clone http://github.com/abelew/hpgltools.git
## > git reset a81cd6dcddd39e5f487fc75cd49a940253a6f2d3
## This is hpgltools commit: Thu Nov 19 13:46:12 2020 -0500: a81cd6dcddd39e5f487fc75cd49a940253a6f2d3
---
title: "hpgltools examples using pasilla"
author: "atb abelew@gmail.com"
date: "`r Sys.Date()`"
output:
 html_document:
  code_download: true
  code_folding: show
  fig_caption: true
  fig_height: 7
  fig_width: 7
  highlight: default
  keep_md: false
  mode: selfcontained
  number_sections: true
  self_contained: true
  theme: readable
  toc: true
  toc_float:
    collapsed: false
    smooth_scroll: false
vignette: >
  %\VignetteIndexEntry{d-04_pasilla}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

<style>
  body .main-container {
    max-width: 1600px;
  }
</style>

```{r options, include=FALSE}
library("hpgltools")
knitr::opts_knit$set(progress=TRUE,
                     verbose=TRUE,
                     width=90,
                     echo=TRUE)
knitr::opts_chunk$set(error=TRUE,
                      fig.width=8,
                      fig.height=8,
                      dpi=96)
old_options <- options(digits=4,
                       stringsAsFactors=FALSE,
                       knitr.duplicate.label="allow")
ggplot2::theme_set(ggplot2::theme_bw(base_size=10))
set.seed(1)
ver <- "20170820"
rmd_file <- "d-04_pasilla.Rmd"
```

# Pasilla vignette: `r ver`

# Example hpgltool usage with a real data set (pasilla)

In this document, I am hoping to mostly copy/paste material from the tests/ tree and explain the
various functionalities therein.  It is my hope therefore to step from data loading all the way
through ontology searching with appropriate visualizations at each stage.

# Load Data

In test_01load_data.R I perform load some data into an expressionset and get ready to play with it.

```{r load_data}
## I use sm to keep functions from printing too much (well, anything really)
tt <- sm(library(hpgltools))
tt <- sm(library(pasilla))
tt <- sm(data(pasillaGenes))
```

## Gather annotation data

biomart is an excellent resource for annotation data, but it is entirely too complex.
The following function 'get_biomart_annotations()' attempts to make that relatively simple.

```{r biomart}
## Try loading some annotation information for this species.
gene_info_lst <- sm(load_biomart_annotations(species="dmelanogaster",
                                             host="useast.ensembl.org"))
gene_info <- gene_info_lst[["annotation"]]
info_idx <- gene_info[["gene_biotype"]] == "protein_coding"
gene_info <- gene_info[info_idx, ]
rownames(gene_info) <- make.names(gene_info[["ensembl_gene_id"]], unique=TRUE)
head(gene_info)
```

## Load count tables

The pasilla data set provides count tables in a tab separated file, let us read them into an
expressionset in the following block along with creating an experimental design.  create_expt() will
then merge the annotations, experimental design, and count tables into an expressionset.

```{r load_counts}
## This section is copy/pasted to all of these tests, that is dumb.
datafile <- system.file("extdata/pasilla_gene_counts.tsv", package="pasilla")
## Load the counts and drop super-low counts genes
counts <- read.table(datafile, header=TRUE, row.names=1)
counts <- counts[rowSums(counts) > ncol(counts),]
## Set up a quick design to be used by cbcbSEQ and hpgltools
design <- data.frame(row.names=colnames(counts),
    condition=c("untreated","untreated","untreated",
        "untreated","treated","treated","treated"),
    libType=c("single_end","single_end","paired_end",
        "paired_end","single_end","paired_end","paired_end"))
metadata <- design
colnames(metadata) <- c("condition", "batch")
metadata[["sampleid"]] <- rownames(metadata)

## Make sure it is still possible to create an expt
pasilla_expt <- sm(create_expt(count_dataframe=counts, metadata=metadata,
                               savefile="pasilla", gene_info=gene_info))
```

The rest of test_01load_data.R checks the various slots of the resulting expt to ensure that
important stuff for future analyses are available, primarily: condition/batch, library sizes,
annotations, counts.

# Graph metrics

The next set of tests seek to ensure that the various plots used to visualize and understand trends
in the data are maintained over time.

In this first block I will use a single function graph_metrics() to plot them all.
And then follow up with the one at a time.  Many functions in hpgltools are quite chatty with
liberal usage of message(), as a result I will sm() this call to shut it up.

```{r graph_metrics, fig.show="hide"}
pasilla_metrics <- sm(graph_metrics(pasilla_expt, ma=TRUE, qq=TRUE))
summary(pasilla_metrics)
```

Now let us print the graphs

```{r print_graphs}
pasilla_metrics$libsize
## The library sizes range from 8-21 million reads, this might be a problem for
## some analyses.
pasilla_metrics$nonzero
## Ergo, the lower abundance libraries have more genes of counts == 0 (bottom
## left).
pasilla_metrics$boxplot
## And a boxplot downshifts them (but not that much because it decided to put
## the data on the log scale).
pasilla_metrics$density
## Similarly, one can see those samples are a bit lower with respect to density

## Unless the data is very well behaved, the rest of the plots are not likely to
## look good until the data is normalized, nonetheless, lets see
pasilla_metrics$corheat
pasilla_metrics$disheat
pasilla_metrics$pc_plot
## So the above 3 plots are pretty much the worst case scenario for this data.
```

# Normalize and replot

The most common normalization suggested by Najib is a cpm(quantile(filter(data))).
On top of that we often do log2() and/or a batch adjustment.
default_norm() quietly does the first and may be supplemented with other arguments.

```{r normalize, fig.show="hide"}
norm <- default_norm(pasilla_expt, transform="log2")
norm_metrics <- graph_metrics(norm)
```

```{r show_norm}
norm_metrics$corheat
norm_metrics$smc
norm_metrics$disheat
norm_metrics$smd
## some samples look a little troublesome here.
norm_metrics$pc_plot
```

# Try a pairwise comparison

```{r perform_pairwise, fig.show="hide"}
pasilla_pairwise <- sm(all_pairwise(pasilla_expt))
pasilla_tables <- sm(combine_de_tables(
  pasilla_pairwise,
  excel="pasilla_tables.xlsx"))
pasilla_sig <- sm(extract_significant_genes(
  pasilla_tables,
  excel="pasilla_sig.xlsx"))
pasilla_ab <- sm(extract_abundant_genes(
  pasilla_pairwise,
  excel="pasilla_abundant.xlsx"))
```

```{r de_pictures}
pasilla_tables$plots[[1]][["deseq_ma_plots"]]$plot
pasilla_tables$plots[[1]][["edger_ma_plots"]]$plot
pasilla_tables$plots[[1]][["limma_ma_plots"]]$plot
```

```{r goseq_test}
up_genes <- pasilla_sig[["deseq"]][["ups"]][[1]]
down_genes <- pasilla_sig[["deseq"]][["downs"]][[1]]
pasilla_go <- load_biomart_go(species="dmelanogaster")$go
pasilla_length <- fData(pasilla_expt)[, c("ensembl_gene_id", "cds_length")]
colnames(pasilla_length) <- c("ID", "length")
pasilla_goseq <- simple_goseq(sig_genes=up_genes, go_db=pasilla_go, length_db=pasilla_length)
pasilla_goseq$pvalue_plots$bpp_plot_over

pasilla_goseq <- simple_goseq(sig_genes=down_genes, go_db=pasilla_go, length_db=pasilla_length)
pasilla_goseq$pvalue_plots$bpp_plot_over

test <- simple_goseq(sig_genes=names(pasilla_ab$abundances$deseq$treated), go_db=pasilla_go, length_db=pasilla_length)
test$pvalue_plots$bpp_plot_over
```


```{r saveme}
pander::pander(sessionInfo())
message(paste0("This is hpgltools commit: ", get_git_commit()))
```
